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Abstract 

High dimensional covariance estimation and graphical models is a contemporary topic in statis¬ 
tics and machine learning having widespread applications. The problem is notoriously difficult 
in high dimensions as the traditional estimate is not even positive definite. An important line 
of research in this regard is to shrink the extreme spectrum of the covariance matrix estimators. 
A separate line of research in the literature has considered sparse inverse covariance estimation 
which in turn gives rise to graphical models. In practice, however, a sparse covariance or inverse 
covariance matrix which is simultaneously well-conditioned and at the same time computation¬ 
ally tractable is desired. There has been little research at the confluence of these three topics. In 
this paper we consider imposing a condition number constraint to various types of losses used in 
covariance and inverse covariance matrix estimation. This extends the approach by Won, Lim, 
Kim, and Rajaratnam (2013) on multivariate Gaussian log likelihood. When the loss function 
can be decomposed as a sum of an orthogonally invariant function of the estimate and its inner 
product with a function of the sample covariance matrix, we show that a solution path algorithm 
can be derived, involving a series of ordinary differential equations. The path algorithm is at¬ 
tractive because it provides the entire family of estimates for all possible values of the condition 
number bound, at the same computational cost of a single estimate with a fixed upper bound. 
An important finding is that the proximal operator for the condition number constraint, which 
turns out to be very useful in regularizing loss functions that are not orthogonally invariant and 
may yield non-positive-definite estimates, can be efficiently computed by this path algorithm. 
As a concrete illustration of its practical importance, we develop an operator-splitting algorithm 
that imposes a guarantee of well-conditioning as well as positive definiteness to recently pro¬ 
posed convex pseudo-likelihood based graphical model selection methods (Zhang and Zou, 2014; 
Khare, Oh, and Rajaratnam, 2015). 
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1 Introduction 


We consider the problem of estimating the covariance matrix or its inverse (precision matrix) from 
n independent copies of p-variate random vectors from some distribution. This estimation problem 
is becoming increasingly important in many statistical methods, from least squares repression to 
graphical model selection. Applications include medical image analysis, genomics, and financial 
engineering, to name a few. In some applications (e.g., portfolio optimization, Gauss mixture clus¬ 
tering) overall risk properties of the covariance estimator are important; in others (e.g., graphical 
model selection), the sparsity pattern of the inverse covariance matrix is of critical interest. In any 
situation, the estimator should be symmetric, positive definite to be a valid (inverse) covariance 
matrix. It is also desirable that the ratios between the eigenvalues of the estimator are not too ex¬ 
tremal, in order to reflect that the population covariance matrix describes a proper, non-degenerate 
p-dimensional distribution. In this paper, we call matrices that satisfy both conditions to be stably 
positive definite. 

Unfortunately, however, many estimators of covariance or inverse covariance matrix are not 
positive definite, let alone stably positive definite. It is well known that the sample covariance 
matrix 


S=- 


1 ” 


( 1 ) 


i=l 


where Xj is the ith copy of the random vector, is merely positive semidefinite when n < p. Some 
high-dimensional covariance matrix estimators based on structural sparsity assumptions may fail 


to be positive definite (Fan, Liao, and Mincheva, 2013); high-dimensional sparse inverse covariance 


matrix estimators based on maximum pseudo-likelihood principle (Meinshausen and Biihlmannl 


2006; Peng, Wang, Zhou, and Zhu 2009 Zhao, Rocha, and Yu 2009 Khare, Oh, and Rajaratnam 


2015; Zhang and Zou, 2014) may have negative eigenvalues, sometimes not even symmetric. 


The main subject of study in this paper is the set of positive definite matrices with bounded 
condition numbers. The condition number of a positive definite matrix quantifies its degree of 
invertiblity, and is defined as the ratio of the largest to smallest eigenvalues of the matrix. Thus 
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the set of interest can be formally written, for an upper bound k, 


= {n : n ^ 0, Amax(^^)/Amin(f^) < n} 

= {Q : 3u > 0,ul ^ Q < kuI}, 


where ^ 0 {resp. A ^ 0) denotes that matrix A is positive dehnite {resp. positive semidefinite), 

A < B means that B — A ^ 0, and Aniax(^) and Aniin(^) refers to the maximum and minimum 
eigenvalues of A] the identity matrix is denoted by I. One should note that the set properly en¬ 
codes the notion of stable positive definiteness. If the (inverse) covariance matrix can be estimated 
constrained on C^, then the estimator possesses the desired properties mentioned in the previous 
paragraph. Because Q £ implies that exists and 0“^ G C^, we do not distinguish estimation 
of the covariance matrix and estimation of the inverse covariance matrix too much; for the reason 


that will become apparent in the sequel, we use Q to denote the inverse covariance matrix. Won, 


him, Kim, and Rajaratnam (2013) studied the set as a means to regularize high-dimensional 


Gaussian maximum likelihood covariance estimators. Their motivation is to impose numerical sta¬ 
bility for inversion of the estimates, for instance to use with Markowitz-type portfolio optimization 
problems. In this paper, we see this idea can be extended to a much general class of loss functions. 
Now consider the estimation problem of the form 


minimize L{Q) — Tr (12/(5')) 
subject to n G Ck, 


( 2 ) 


where L(f2) is convex; 5 is the sample covariance matrix and / is a function that maps a 
symmetric matrix to a symmetric matrix of the same dimension. Problem (§ includes many 
interesting cases: 


1. Gaussian log likelihood: L(n) = — logdetfl, /(5) = —5. 


2. Gaussian log likelihood with a-pair-of-nuclear-norms regularization (Ghi and Lange, 2014) 
L{n) = -logdetI2 + ?7(a||f2||* + (l-a)||L!-i||*), /(5) = -5. 
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3. Quadratic loss: L{Q) = (l/2)||r2|||,, f{S) = S. 


4. CONCORD loss ( |Khare et al.l[20l^ : L(D) 

— diag(Oii,..., Opp). 


logdet Dd + (l/2)Tr(DS'0), /(S') = 0, where 


5. D-trace loss (Zhang and Zou, 2014); L(D) 


(l/2)Tr(DSD),/(S) = /. 


Hence a characterization of the solution to (j^ is of an utter interest. Cases 0-1 are distin¬ 
guished from the rest because in these cases L(0) is orthogonally invariant, i.e., L{Q"^QQ) = L(D) 
for any Q such that Q^Q = I, with an additional condition that L{D) = Ym=i h{di), h being closed 
convex, if D = diag(di,..., dp). For instance. 


hiX) = 


— log A, casein 

— log A-|-r/(Q;A-|-(1 — a)A“^), case El 

(1/2)A^, case|3l 


In such cases, we can provide a complete characterization of the solution path of ([^ as the parameter 
K varies from unity to infinity. Furthermore, we show that for many interesting cases, the entire 
solution path can be computed at the same cost (namely, in 0{p) operations) as that of finding the 
solution for a fixed k. Thus the characterization of the solution path provides a huge computational 
advantage in solving (© efficiently. 

Cases 1^ and are pseudo-likelihood losses that arise in high-dimensional graphical model se¬ 
lection. Orthogonal variance of L(D) in these cases prevents a direct application of the method 
mentioned in the previous paragraph. Nevertheless we can show that problem ([^ with these losses 
can be efficiently solved by a scalable, Dykstra’s alternating projection-type operator splitting 


method (Lange, 2013), resulting in a sparse, stably positive definite covariance selection. This 


is because the orthogonal projection of a symmetric matrix to set Ck has an almost closed form 
representation, a result that follows from Section In this sense, case bridges cases and with 
cases IHandlSl 

The rest of the paper is organized as follows. In Section we characterize the solution path 
for the orthogonally invariant cases as soultions of ordinary differential equations with respect 
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to K, introduce an efficient method to solve (§ for all values of k based on this observation. 
Explicit solutions to some cases introduced in this section are also provided. In Section we 
develop an alternating projection algorithm that solves the orthogonally variant cases scalably, and 
demonstrate that the algorithm provides stably positive semidefinite solutions to graphical model 
selection problems, without loosing the desired sparsity. Section concludes this paper. Some 
proofs of the results in the paper are given in the Appendix. 


2 Solution path for orthogonally invariant L{fl) 

We begin with the characterization of the solution to Q for a fixed k. 

Theorem 1. Suppose the spectral decomposition of f{S) is given by VDV^, V'^V = VV'^ = I, D = 
diag(di,..., dp), di > ■ ■ ■ > dp. Then, 0* = VA*V'^ minimizes Q, where A* = diag(A 2 ,..., A*) 
with 


A* = max(u*, min(Ai, Ku*)). (3) 

The Xi is the minimizer of li{\) — diX in A > 0. Let = argmin„ where 

a a p p 

^a,(3i,'^) — ^ ^ lp—i+l{u) U ^ ^ dp—i-\-l A ^ ^ lp—i-\-l(^Ku) KU ^ ^ dp_j-|_l 
i=l 1=1 i=f} i=fi 

for a G p — 1} and (3 G {2, ...,p}. Then u* can be chosen to equal to Ua,p for {a,j3) 

satisfying the relation 

(^q:,/ 3Wa,/3) £ — ^(^u,v) . Ap—Q-i-i < n A Xp—oi,Xp—fj-^-2 A n < Ap_^-|-i}, 


Finding the pair (a, f3) takes 0{p) time. 


The proof is given in Appendix 1. 


Remark 1. This theorem subsumes Won et al. (2013, Theorem 1) that corresponds to case[^ and 
allows f{S) to be indefinite or singular, i.e., d* < 0 for some i. Thus A* = oo or A* = —oo is 
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allowed. 


Remark 2. An insepection of the proof reveals that the problem reduces to determine 

p 

u* = argminy^ li{X*{u)) — diX*{u), 

u>0 ^ 

where X*{u) = max(u, min(Ai, ku)), i.e. a univariate minimization problem. Thus standard uni¬ 
variate optimization methods, e.g., bisection or golden search, can also be employed to find u*, 
subject to a tolerance level. The theorem says that it can be found exactly within 0{p) operations. 

If liS are continuously differentiable, the Ua,p in Theoremcan be found by solving the equation 

OL p a p 

Ip-i+i X] dp-i+i + nY dp-i+i- (4) 

i=l i=p i=l i=^ 

Then the implicit function theorem states that = Ua,/ 3 {K) is a continous function of k. Thus 
if the optimal u* in Q satisihes u*{k) = Ua,p{K) so that Ua,i 3 {K),Va,fB £ for some a,/3, 

where intA denotes the interior of a set A, then a small change in k will not change a or (3, i.e., 
u*{k, + Aac) = Ua^isin + Ak) and {ua,i 3 {n + AK),Va,/3{n + Ak)) G intRa^jS for sufficiently small Ak. 
Thus the local solution path within Ra,i 3 can be traced by solving Q for continuously varying n 
subject to the condition u*{k) G If we further assume that ks are twice differentiable, this 

local path can be completely characterized by an ordinary differential equation: it is straighforward 
to derive 


dUa,l3 Yl^i=pdp-i+l 

from which the curve {u*( k), v*( k)) within Ra,i 3 can be determined. 
Example 1. For case[^ we have 

dua,i3 _ (a + P - ^ + 1) Z]f=/3 Si 
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where Si is the ith largest eigenvalue of S. In this case @ has an explicit solution 



a + p - P + 1 


which satisfies Q. Furthermore, because 



it follows that 


7 \^) 5 

dUa,^ ^i=l3 Si 


which is constant within Ra,j 3 - In other words, the solution path is piecewise linear in the u-v plane. 
Example 2. For case[^ we have 


Yli=0 Sp-i+i - 2{p- + 1 )ku 

a + K^{p - /3 + 1) 


dUajj 

dn 


whose general solution is given by 



Ua^fsin) = K exp 


for some constant K > 0. 

Will the piecewise smooth solution path above be continuous as well? The concern is that at 


the boundary of the rectangle Ra,i 3 where a small change of k indeed alters a and/or /?, there may 


be a jump in the path. The following lemma shows that this will not happen. 

Lemma 1. Suppose for some k with {ua,0iii),Va,f3{h)) GintRa^/s. LetR = sup{K: ^ 

-Rq,/?}- Then the point {ua,/3{R),Va,i3{K)) coincides with either {ua-i,i3{R),Va-i,/3iR)) G Ra-i^js, 
{ua,p+i{R),Va,i 3 +iiK)) G Ra,i 3 +i, or (it),U q,_i,/3+i( it)) G Ra-i,p+i exclusively. 

The proof is given in Appendix 1. 

We have so far seen that the solution path is continuous and piecewise smooth, and how the 
curve pieces can be computed and traced. The remaining task is to determine the initial point 


7 












the path. The initial point can be obviously chosen to the point that corresponds to k = 1, i.e., 
we need to find a and j3 such that (uq,^/3(1), G Rafi- Note in this case that the closure of 

the desired Ra^p should intersect with the line v = u. By construction, this occurs if and only if 
a = P — 1. Then, from @ with K = 1, it follows that 

p p ^ _ I P 

='^^di = pd, where d=-''^^di, (8) 

i=l i=l ^ i=l 

and u*(l) is found by solving this equation. In particular, if Z* = / for i = 1,... ,p, then 

u*(i) = {irHd), 

where is the generalized inverse of I', which exists because I' is nondecreasing. Thus for case 

[^we obtain u*(l) = 1/s, and for casewe have u*(l) = s. 

Combining Lemmaj^and the above discussion, we are ready to fully describe the entire solution 
path, as stated in the following theorem. 

Theorem 2. Ifli, z = 1,... , p, are closed convex and twice differentiable, the lower truncation value 
u*{k) for the optimal eigenvalue (© for problem together with the upper truncation value v{k) = 
ku{k) traces a piecewise smooth path on the u-v plane as the regularization parameter n varies. The 
resulting solution path is given by the solutions of the series of ordinary differential equations Q, 
and its slope is discontinuous only when it intersects the vertical lines u = Ai,..., Ap or horizontal 
lines V = Xi,... ,Xp. The initial point of this path is found by solving Q, corresponding to k = 1. 
This initial point as well as the entire path can be found in 0{p) operations (Algorithm^ . 

Proof. Line 4 of Algorithm takes 0{p) operations. In the loop, either of the conditions in Lines 
11 and 12 must be met for each iteration. Thus for each value of a = 1,2,... ,p, at most one value 
of f G {1,2,... ,p} is considered. This takes 0{p) time. □ 

Remark 3. Algorithm^ terminates if v* = Xp-r+i, where 


Al ^ ^ Xy. ^ A^^^ - ‘ ‘ ‘ — Xp, 
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Algorithm 1 Solution path algorithm for orthogonal L 
1: Set Knew ^ 1 

2: Find Unew = Kew by solving @ 

3: Find a such that Ap_Q,+i < Ung^ < \p-a] set /3 ■(— a + 1 
4: Set K, ■<— {Knew}, 2^ {(ct,/?)} 

5: While (a > 1 and /3 < p) 

6: Compute Ua^/sin) by solving 

7' Set K^ i inf{K ^ Knew ■ — Ap_Q,-|_i} 

8: Set Ky infjK > Knew : KUa,l3{K) = Ap_/ 3 +i} 

9: Set Knew niin(Ku, k„) 

10: Af i — )C U {Knew}, 2i i — X U {(o, /d)} 

11: If Ua,l3i^new) — ^p—a+l then ex i CX 1 

12: If Knew'^a,/3(^new) — Ap_^^i then /3 i j3 \ 

13: Return /C,X 


understanding Ap+i = —oo. This includes the case when S is singular, i.e., 


Sr ^ ^ 7^ d — S7*-|-r — * * * — s^p' 


For case using the fact that the solution path is piecewise linear in the u-v plane, a simple 
geometric algorithm can be devised. This is shown in Algorithm 

Algorithm 2 Solution path algorithm for case[^ 

1: Set Knew ^ 1, ttnew = Kew = !/« 

2: Find a such that la > I > la+i', set /3 •(— a + 1 
3: Set K, ^ {Knew}, Id ^ {u*}, V = {u*} 

4: While (a > 1 and (5 <p) 

5 : -{'ZUk)l{Y:Uli) 

6: Ra,p {(^,'^) • <u< 1/la+i and l/ljs-i <v< l/l^} 

7: {u*,v*) •<— intersection between line passing (Unew,^new) of slope t 

and boundary of Ra,p, with u* < Ung^ 

8: fi^new V*/u*, U^g^ •(— U*, <— V* 

9: AT <— U {Knew}, Id U U {'Wng.j^r}, V •(— V U {Ungw} 

10: If tt* = 1 /sq then a •(— a — 1 

11: If u* = 1/s^ then /3 <— /? + 1 

12: Return/C,f2, V 
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3 Solution procedure for orthogonally variant L{Q) 

With an additional sparsity-incuding penalty, problem (§ can be compactly written 


minimize /ii(0) + 
subject to O G Ck, 


where /ii(Q) = L(0) — Tr(n/(5) and /i 2 (tl) = /i|t2|i = To be specific, 


hi{n) 


— logdet 0/) + (l/2)Tr(n50), caseHl 
(l/2)Tr(0,Sbl)-Tr(bl), caseO 


Problem Q can be equivalently written 


minimize hi{Q) + /i 2 (n) +2c^(n). 


( 9 ) 


where 




0 , 

< 

+ 00 , otherwise. 


is the indictor function of the set C^- Because both hi and /12 are not orthogonally invariant, it is 
not obvious how to handle this spectral constraint set efficiently. The key idea here is to utilize the 
fact that the proximal operator of the indicator function Ic^, that is, the orthogonal projection to 
Ck, is efficiently computed using AlgorithmFor X € E>p, where is the space of pxp symmetric 
matrices, the proximal operator is defined as follows. 


VcAX) = argminXc.(^) + - Xfp, t > 0. 


The optimization problem involved in the right hand side of (10) is 


( 10 ) 


minimize (1/2)||X — A||^ 
subject to X G Ck, 
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i.e., case 


: [^ Thus, Algorithm gives the entire solution to (10) for all k > 1 in 0{p) operations, 
with the smooth pieces has a closed form given in ([^, given the spectral decomposition of X. 

Now ([^ can be solved by using Dykstra’s alternating projection algorithm (Lange, 2013, Ch. 
15): 


Q{k+i/2) argmin/ii(Q) + /i2(Ll) + (1/2)||Q - 

Q(fc+l/2) 


oesp 

2Q{k+l/2) _ Qik) 


( 11 ) 


Qik+l) 


which is an instance of the Douglas-Rachford operator splitting algorithm ( Eckstein and Bertsekas| 
1992); converges is guaranteed if hi{fl) + h 2 {Vt) is closed convex, which holds for casesand 
For case[^ the subproblem © is to solve 


minimize 


- logdet Dd + (l/2)Tr(D(5 + (l/2)/)D) - Tr(DD(^)) + 


which is yet another CONCORD problem. This problem can be efficiently solved via the block 


coordinate descent (Khare et ah, 2015), or proximal gradient methods (Oh, Dalai, Khare, and 
Rajaratnam| |2014l ) . 


For case m© reduces to a lasso program ([Tibshirani] |1996): 


minimize (l/2)Tr(D(S' + (l/2)/)D) — Tr(D(I + + /i|D|i, 


which can again be efficiently solved via proximal gradient methods (Beck and Teboulle, 2009). 


Illustration To illustrate the effect of the condition number regularization, we generated n = 200 
samples from p = 10 dimensional multivariate normal distribution with zero mean and inverse 
covariance matrix D such that Djj = 1 for i = 1,... ,p and D 15 = D 51 = D 26 = ^62 = -99. We 


compared the estimated D obtained using the CONCORD-ISTA algorithm (Oh et ah, 2014) with 
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sparisty level /i = 0.1 and that using the alternating projection algorithm of this section, where the 
upper bound for the condition number is set to 10 and the same CONCORD-ISTA is used for the 
subproblem 0- With the tolerance for the relative change of the estimates set as 1 x 10 ® (the 
meanings of the relative change are not the same between these two, though), the former terminated 
within 1000 iterations, and the latter within 503 iterations, where the inner CONCORD-ISTA is 
ran up to 100 iterations for each outer iteration. Both methods gave a similar sparsity pattern for 
the estimates (Figure [^. However, the inverse covariance matrix obtained using the CONCORD 
loss only is on the vicinity of singularity, with the minimum eigenvalue of 0.0102. The maximum 
eigenvalue was 1.98, giving the condition number of 194. On the other hand, the CONCORD loss 
combined with the condition number regularization yielded the minimum eigenvalue of 0.114, more 
than 10 times greater than the pseudo-likelihood-only counterpart, while the maximum eigenvalue 
was moderately reduced to 1.14. (Thus the condition number bound of 10.0 was retained.) The 
eigenvalue distributions of both cases are shown in Figure 


4 Conclusion 


We have considered imposing a condition number constraint to regularize the estimator of the 
covariance of inverse covariance matrix of a population distribution under various loss criteria. For 
the losses that consists of an orthogonally invariant term and an inner product with a function of the 
sample covariance matrix, the problem reduces essentially that of the eigenvalues of the estimator, 
and the entire solution path with respect to the degree of condition number regularization can be 
obtained. If the involved ordinary differential equation admits a closed form solution, then the 
path can be obtained at the same cost as hnding the estimator for a fixed regularization parameter. 
For other losses, an operator splitting scheme can be employed to find the estimator, hence the 
problem is scalable. At the core of this scheme lies the fact that the projection operator to the set 
of matrices with bounded conditio numbers allows path solutions, due to its orthogonal invariance. 

The most expensive part in computing the solution paths is the spectral decomposition. As 


noted by Chi and Lange (2014), randomized algorithms such as random projection to lower dimen¬ 


sional subspaces may provide a computational relief (Mahoney, 2011). These approaches incurs 
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a small loss in accuracy, thus a possible research direction is to handle inexact solutions to the 
optimization subprolems in the alternating projection algorithm properly. 


Appendix 1 


Proof of Theorem^ First note that both L(n) and are orthogonally invariant, hence depends 
only on the eigenvalues of Q. Suppose the spectral decomposion of O is UAU'^, U'^U = UU'^ = I, 
A = diag(Ai,..., \p), Ai > • • • > Xp. For the trace part of the objective, the von Neumann-Fan 
inequality (Mirsky, 1975; Farrell, 1985[ Lange, 2013, Appendix A.4) asserts that 


Tr{nf{S)) <Tr{AD) = '^Xidi, 


2 = 1 


with equality if and only if V = U. Thus problem Q reduces to a p + 1-variate problem 


minimize YjI=i “ diXi 
subject to u < Xi < Ku, i = 1,... ,p, 
Ai ^ * * * ^ Ap, 


( 12 ) 


where the variables are Ai,..., Ap and u. The last order constraint can be removed, because of 


the following. Without the order constraint, for a fixed n > 0, the reduced problem (12) becomes 
separable in A^; it suffices to solve 


minimize li{Xi) — diXi 
subject to u < Xi < Ku 


(13) 


for each i = 1,... ,p. Convexity of the objective in (13) ensures that the minimum is attained at 


X*{u) = max(u, min(Aj, Ku)), 
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where Aj = argmin^^ ki^) — diX. The optimality condition for A* is given by 


di G dli{Xi) Xi G dg*{di), 


where df{x) denotes the subdifferential of / at x, and g*{v) = sup(A, u) — g{X), the convex conjuate 
of g{X). Monotoniciy of the subdifferential operator ensures that AjS perserve the order of djS, i.e., 


Ai > • • • > Xp. It follows that AJ(m) > • • • > A*(rt), hence (12) reduces to a univariate minimization 
problem over u 


minimize k{X*(u)) — diX*(u) 


(14) 


Z=1 


The solution to (14), u*, must satisify 




= 


u*, 


a*. 

Xi, 

i = a* + l 


KU*, 

i = 13*,... 

,P, 


where a* and /3* are such that Ap_Q,*+i < u < Xp-a* and Ap _^*_|_2 < nu* < Ap_^*_|_i. To find m*, 
for a G {I,... ,p — 1} and jd G {2,... ,p}, define 




(u) 


u, i = 1,... ,a, 

' Xi, i = a + l,...,(d -1, 
Ku, i = l3,...,p, 


and 


Wrv /3 = argmin^/p_i+i(A"f._^^(n)) - dp_i+iA"f._^^(w) = argmin 


2=1 


By construction, coincides with if and only if 


Q+l ^ q ;,/3 — ^p—a and Ap_^_|_2 ^ ^ 


(15) 
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or iua,fs, Krto,/?) £ Ra, 0 - Because partition the u-v plane into (p+2)^ regions and {ua,i 3 , i^Ua^p) 

is on the line v = ku, an obvious algorithm to find the pair (a, /3) that satisfies the condition (15) is 


to keep track of the rectangles Ra,i 3 that intersect this line. To see that this algorithm takes 0{p) 
operations, start from the origin of the u-v plane, increase u and v along the line v = ku. Since 
K > 1, if the line intersects Ra,i 3 , then the next intersection occurs in one of the three rectangles: 
Ra+i^p, Ra,i 3 +i, and Ra+i,i 3 +i. Therefore after finding the first intersection (which is on the line 


u = Ai), the search requires at most 2p tests to satisfy condition (15). Finding the first intersection 
takes at most p tests. □ 


Proof of Lemma^ Increase k from R. Suppose the curve passing the point 

meets the left side (but not inclusive) {(u, u) : u = Ap_a+i} of Ra,i 3 before it meets the upper side 
(also not inclusive) {{u,v) : v = Ap_/ 3 +i}. Then, taking the limit of both sides of (|^ as k k, and 
by continuity of Ua,i 3 {K), we have 


a p a p 

^p-i+i(^p-Q+i) ~k k ^ ^ /p_j_|_]^(kAp_Q,-|-i) = dp-i^i + k ^ ^ dp-i^i- ( 16 ) 

i=l i=p i=l i=l3 


Optimality of Xp-a+i (see ([T^) and continuity of asserts that 


^p-i+li^p—a+i) ~ dp—a+1- 


Thus (16) is equivalent to 


0—1 p 0—1 p 

^ ^ ^p—i+i(^p—«+i) ~k k ^ ^ /p_j_[_]^(kAp—g-i-i) = ^ ^ dp—i^i + k ^ ^ dp—i-\-i. 
i=l i=0 i=l i=j3 


In other words. 


Ap—0+1 — 'i^o—1,/3(^) 


and (ua,/3(K), Uq,^^(k)) = (u^-i,/?(«:), Ua_i^^(K)) G Ra-i^/s- If the curve meets the upper side before 
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the left side of -Rq,/?, we have 


ol p a p 

'y ^ /3+i/^)+^ ^p—i+i(^p—/3+i) = y ^ dp-i-i-i +/t ^ ^ dp-jj^i, 

i=l i=0 i=l i=0 

hi^p—fi+l) ~ / 3 + 1 ) 


and thus 

a p a p 

^p-i+l(^p-/3+l/^) + ^ ^p-i+l(^p-/3+l) = y ^ tip-i+l + ^ dp-i-^-l 

i=l i=l3+l i=l ^=/3+l 

to have Uq,^/ 3 (k)) = (itQ,^/ 3 +i(K),Ua^^+i(K)) G Ra,/ 3 +i- The final case, that the curve 

meets the upper left corner of Ra,i 3 , is the combination of previous two cases, and it follows that 
(.Ua,^{K),Va,p{K)) = {Ua-l,l3+l{K),Va-l,l3+l{K)) G Ra-1,^+1- □ 
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(b) CONCORD estimate 
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(c) CONCORD estimate with an upper bound on condition number 


Figure 1: Rlustration of the effect of the condition number regularization on the CONCORD 
pseudo-likelihood graphical model section. 
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Figure 2: Distribution of the eigenvalues of the CONCOND-only inverse covariance matrix estimate 
(x), and CONCORD with condition number regularization (+). 
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